Inner joins and full joins
You may have noticed that some of the songs in songs correspond to some of the albums in albums. Suppose you want a new dataset that contains all of the songs for which you have data from both albums and songs. How would you make it?
The artists and bandsdatasets also share some information. What if you want to join these two datasets in such a way that you retain all of the information available in both tables, without throwing anything away?
You can think of inner joins as the most strict type of join: they only retain observations that appear in both datasets. In contrast, full joins are the most permissive type of join: they return all of the data that appears in both datasets (often resulting in many missing values).
Recall that, *_join(x, y) joins y to x. The second dataset you specify is joined to the first dataset.
# Join albums to songs using inner_join()
songs %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
|
song
|
album
|
first
|
last
|
|
Come Together
|
Abbey Road
|
John
|
Lennon
|
|
Dream On
|
Aerosmith
|
Steven
|
Tyler
|
|
Hello, Goodbye
|
Magical Mystery Tour
|
Paul
|
McCartney
|
|
It’s Not Unusual
|
Along Came Jones
|
Tom
|
Jones
|
albums %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
|
album
|
band
|
year
|
|
A Hard Day’s Night
|
The Beatles
|
1964
|
|
Magical Mystery Tour
|
The Beatles
|
1967
|
|
Beggar’s Banquet
|
The Rolling Stones
|
1968
|
|
Abbey Road
|
The Beatles
|
1969
|
|
Led Zeppelin IV
|
Led Zeppelin
|
1971
|
|
The Dark Side of the Moon
|
Pink Floyd
|
1973
|
inner_join(songs, albums, by = "album") %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
|
song
|
album
|
first
|
last
|
band
|
year
|
|
Come Together
|
Abbey Road
|
John
|
Lennon
|
The Beatles
|
1969
|
|
Dream On
|
Aerosmith
|
Steven
|
Tyler
|
Aerosmith
|
1973
|
|
Hello, Goodbye
|
Magical Mystery Tour
|
Paul
|
McCartney
|
The Beatles
|
1967
|
# Join bands to artists using full_join()
artists %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
|
first
|
last
|
instrument
|
|
Jimmy
|
Buffett
|
Guitar
|
|
George
|
Harrison
|
Guitar
|
|
Mick
|
Jagger
|
Vocals
|
|
Tom
|
Jones
|
Vocals
|
|
Davy
|
Jones
|
Vocals
|
|
John
|
Lennon
|
Guitar
|
bands %>%
head() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
|
first
|
last
|
band
|
|
John
|
Bonham
|
Led Zeppelin
|
|
John Paul
|
Jones
|
Led Zeppelin
|
|
Jimmy
|
Page
|
Led Zeppelin
|
|
Robert
|
Plant
|
Led Zeppelin
|
|
George
|
Harrison
|
The Beatles
|
|
John
|
Lennon
|
The Beatles
|
full_join(artists, bands, by = c("first","last")) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
|
first
|
last
|
instrument
|
band
|
|
Jimmy
|
Buffett
|
Guitar
|
The Coral Reefers
|
|
George
|
Harrison
|
Guitar
|
The Beatles
|
|
Mick
|
Jagger
|
Vocals
|
The Rolling Stones
|
|
Tom
|
Jones
|
Vocals
|
NA
|
|
Davy
|
Jones
|
Vocals
|
NA
|
|
John
|
Lennon
|
Guitar
|
The Beatles
|
|
Paul
|
McCartney
|
Bass
|
The Beatles
|
|
Jimmy
|
Page
|
Guitar
|
Led Zeppelin
|
|
Joe
|
Perry
|
Guitar
|
NA
|
|
Elvis
|
Presley
|
Vocals
|
NA
|
|
Keith
|
Richards
|
Guitar
|
The Rolling Stones
|
|
Paul
|
Simon
|
Guitar
|
NA
|
|
Ringo
|
Starr
|
Drums
|
The Beatles
|
|
Joe
|
Walsh
|
Guitar
|
NA
|
|
Brian
|
Wilson
|
Vocals
|
NA
|
|
Nancy
|
Wilson
|
Vocals
|
NA
|
|
John
|
Bonham
|
NA
|
Led Zeppelin
|
|
John Paul
|
Jones
|
NA
|
Led Zeppelin
|
|
Robert
|
Plant
|
NA
|
Led Zeppelin
|
|
Charlie
|
Watts
|
NA
|
The Rolling Stones
|
|
Ronnie
|
Wood
|
NA
|
The Rolling Stones
|
Pipes
You can combine dplyr functions together with the pipe operator, %>%, to build up an analysis step-by-step. %>% takes the result of the code that comes before it and “pipes” it into the function that comes after it as the first argument of the function.
So for example, the two pieces of code below do the same thing:
full_join(artists, bands,
by = c("first", "last")) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
artists %>%
full_join(bands, by = c("first", "last"))
Pipes are so efficient for multi-step analysis that you will use them for the remainder of the exercises in this course.
# Find guitarists in bands dataset (don't change)
temp <- left_join(bands, artists, by = c("first", "last"))
temp <- filter(temp, instrument == "Guitar")
select(temp, first, last, band)
## # A tibble: 5 x 3
## first last band
## <chr> <chr> <chr>
## 1 Jimmy Page Led Zeppelin
## 2 George Harrison The Beatles
## 3 John Lennon The Beatles
## 4 Jimmy Buffett The Coral Reefers
## 5 Keith Richards The Rolling Stones
# Reproduce code above using pipes
bands %>%
left_join(artists, by = c("first", "last")) %>%
filter(instrument == "Guitar") %>%
select(first, last, band)
## # A tibble: 5 x 3
## first last band
## <chr> <chr> <chr>
## 1 Jimmy Page Led Zeppelin
## 2 George Harrison The Beatles
## 3 John Lennon The Beatles
## 4 Jimmy Buffett The Coral Reefers
## 5 Keith Richards The Rolling Stones
Practice with pipes and joins
We’ve created a data frame for you called goal. Your mission, if you choose to accept it, is to create a dataset that’s identical to goal.
# Examine the contents of the goal dataset
goal %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
|
first
|
last
|
instrument
|
band
|
song
|
album
|
|
Tom
|
Jones
|
Vocals
|
NA
|
It’s Not Unusual
|
Along Came Jones
|
|
John
|
Lennon
|
Guitar
|
The Beatles
|
Come Together
|
Abbey Road
|
|
Paul
|
McCartney
|
Bass
|
The Beatles
|
Hello, Goodbye
|
Magical Mystery Tour
|
# Create goal2 using full_join() and inner_join()
goal2 <- artists %>%
full_join(bands, by = c("first","last")) %>%
inner_join(songs, by = c("first","last"))
# Check that goal and goal2 are the same
setequal(goal, goal2)
## [1] TRUE
Remember that the order of the joins in your pipe determines the output of the pipe.
Choose your joins
You’re getting the hang of pipes now! They are a very useful way to combine multiple joins to make a single dataset.
# Create one table that combines all information
artists %>%
full_join(bands, by = c("first","last")) %>%
full_join(songs, by = c("first","last")) %>%
full_join(albums, by = c("album", "band")) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 11) %>%
row_spec(0, bold = T, color = "white", background = "#3f7689")
|
first
|
last
|
instrument
|
band
|
song
|
album
|
year
|
|
Jimmy
|
Buffett
|
Guitar
|
The Coral Reefers
|
NA
|
NA
|
NA
|
|
George
|
Harrison
|
Guitar
|
The Beatles
|
NA
|
NA
|
NA
|
|
Mick
|
Jagger
|
Vocals
|
The Rolling Stones
|
NA
|
NA
|
NA
|
|
Tom
|
Jones
|
Vocals
|
NA
|
It’s Not Unusual
|
Along Came Jones
|
NA
|
|
Davy
|
Jones
|
Vocals
|
NA
|
NA
|
NA
|
NA
|
|
John
|
Lennon
|
Guitar
|
The Beatles
|
Come Together
|
Abbey Road
|
1969
|
|
Paul
|
McCartney
|
Bass
|
The Beatles
|
Hello, Goodbye
|
Magical Mystery Tour
|
1967
|
|
Jimmy
|
Page
|
Guitar
|
Led Zeppelin
|
NA
|
NA
|
NA
|
|
Joe
|
Perry
|
Guitar
|
NA
|
NA
|
NA
|
NA
|
|
Elvis
|
Presley
|
Vocals
|
NA
|
NA
|
NA
|
NA
|
|
Keith
|
Richards
|
Guitar
|
The Rolling Stones
|
NA
|
NA
|
NA
|
|
Paul
|
Simon
|
Guitar
|
NA
|
NA
|
NA
|
NA
|
|
Ringo
|
Starr
|
Drums
|
The Beatles
|
NA
|
NA
|
NA
|
|
Joe
|
Walsh
|
Guitar
|
NA
|
NA
|
NA
|
NA
|
|
Brian
|
Wilson
|
Vocals
|
NA
|
NA
|
NA
|
NA
|
|
Nancy
|
Wilson
|
Vocals
|
NA
|
NA
|
NA
|
NA
|
|
John
|
Bonham
|
NA
|
Led Zeppelin
|
NA
|
NA
|
NA
|
|
John Paul
|
Jones
|
NA
|
Led Zeppelin
|
NA
|
NA
|
NA
|
|
Robert
|
Plant
|
NA
|
Led Zeppelin
|
NA
|
NA
|
NA
|
|
Charlie
|
Watts
|
NA
|
The Rolling Stones
|
NA
|
NA
|
NA
|
|
Ronnie
|
Wood
|
NA
|
The Rolling Stones
|
NA
|
NA
|
NA
|
|
Steven
|
Tyler
|
NA
|
NA
|
Dream On
|
Aerosmith
|
NA
|
|
NA
|
NA
|
NA
|
The Beatles
|
NA
|
A Hard Day’s Night
|
1964
|
|
NA
|
NA
|
NA
|
The Rolling Stones
|
NA
|
Beggar’s Banquet
|
1968
|
|
NA
|
NA
|
NA
|
Led Zeppelin
|
NA
|
Led Zeppelin IV
|
1971
|
|
NA
|
NA
|
NA
|
Pink Floyd
|
NA
|
The Dark Side of the Moon
|
1973
|
|
NA
|
NA
|
NA
|
Aerosmith
|
NA
|
Aerosmith
|
1973
|
|
NA
|
NA
|
NA
|
Fleetwood Mac
|
NA
|
Rumours
|
1977
|
|
NA
|
NA
|
NA
|
Eagles
|
NA
|
Hotel California
|
1982
|